Belief Propagation Implementation Using CUDA on an NVIDIA GTX 280
Abstract
Disparity map generation is a significant component of vision-based driver assistance systems. This paper describes an efficient implementation of a belief propagation algorithm on a graphics card (GPU) using CUDA (Compute Unified Device Architecture) that can speed up stereo image processing by a factor of 30 to 250. For evaluation purposes, different kinds of images have been used: reference images from the Middlebury stereo website, and real-world stereo sequences recorded with the research vehicle of the .enpeda.. project at The University of Auckland. The paper provides implementation details, primarily concerned with the inequality constraints, involving threads and shared memory, that must be satisfied for efficient programming on a GPU.
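The abstract does not spell out the inequality constraints, so the following is only a rough sketch of the kind of budget they might express, not the paper's actual formulation. It assumes each thread keeps `buffers_per_thread` arrays of `num_disparities` float values in shared memory (a hypothetical layout), and uses the compute capability 1.x limits of the GTX 280: 16 KB of shared memory per multiprocessor, at most 512 threads per block, and 32 threads per warp. The function name and parameters are illustrative.

```python
# Hardware limits for compute capability 1.x devices such as the GTX 280.
SHARED_MEM_BYTES = 16 * 1024    # shared memory per multiprocessor
MAX_THREADS_PER_BLOCK = 512     # upper bound on threads in one block
WARP_SIZE = 32                  # threads scheduled together


def max_threads_per_block(num_disparities, buffers_per_thread=1, bytes_per_value=4):
    """Largest warp-aligned block size whose per-thread buffers fit in shared memory.

    Assumed (hypothetical) layout: every thread owns `buffers_per_thread`
    arrays of `num_disparities` values, all stored in shared memory.
    Returns 0 when not even one full warp fits.
    """
    bytes_per_thread = num_disparities * buffers_per_thread * bytes_per_value
    # Constraint 1: threads * bytes_per_thread <= SHARED_MEM_BYTES
    fit_in_shared = SHARED_MEM_BYTES // bytes_per_thread
    # Constraint 2: threads <= MAX_THREADS_PER_BLOCK
    limit = min(fit_in_shared, MAX_THREADS_PER_BLOCK)
    # Round down to a whole number of warps.
    return (limit // WARP_SIZE) * WARP_SIZE


if __name__ == "__main__":
    for d in (4, 16, 64, 256):
        print(d, max_threads_per_block(d))
```

Under these assumptions the usable block size shrinks quickly as the disparity range grows, which is one plausible reason such constraints matter for a belief propagation kernel. The 16 KB figure is an upper bound, since on these devices a small part of shared memory is also used for kernel arguments.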
Similar resources
Optimizing and Auto-tuning Belief Propagation on the GPU
A CUDA kernel will utilize high-latency local memory for storage when there are not enough registers to hold the required data or if the data is an array that is accessed using a variable index within a loop. However, accesses from local memory take longer than accesses from registers and shared memory, so it is desirable to minimize the use of local memory. This paper contains an analysis of s...
Data-parallel Micropolygon Rasterization
We implement a tile based sort-middle rasterizer in CUDA and study its performance characteristics when used as a backend for adaptive tessellation down to micropolygons. Tessellation and bucketing map very well to the data-parallel paradigm of CUDA, and the majority of time is spent with rasterization. Despite this, our fastest implementation is able to reach 30-50% of the hardware rasterizati...
Study of Parallel Image Processing with the Implementation of vHGW Algorithm using CUDA on NVIDIA’S GPU Framework
This paper studies the implementation of parallel image processing techniques using CUDA on the NVIDIA GPU framework. It also discusses the major requirements of parallelism in medical image processing techniques. An additional important aspect of this paper is the development of the vHGW (van Herk/Gill-Werman morphology) algorithm, intended for erosion and dilation, proposed for diverse...
ECM on Graphics Cards
This paper reports record-setting performance for the elliptic-curve method of integer factorization: for example, 604.99 curves/second for ECM stage 1 with B1 = 8192 for 280-bit integers on a single PC. The state-of-the-art GMP-ECM software handles 171.42 curves/second for ECM stage 1 with B1 = 8192 for 280-bit integers using all four cores of a 2.4 GHz Core 2 Quad Q6600. The extra speed takes a...
An approach to Improve Particle Swarm Optimization Algorithm Using CUDA
The time consumption in solving computationally heavy problems has always been a concern for computer programmers. Due to simplicity of its implementation, the PSO (Particle Swarm Optimization) is a suitable meta-heuristic algorithm for solving computationally heavy problems. However, despite the simplicity, the algorithm is inefficient for solving real computationally heavy problems but the pr...